AI023
Introduction to Triton Programming
Environment Setup and Identifying GPU Bottlenecks
Learning Objectives
- Configure and verify a production-ready GPU development environment with the CUDA or ROCm toolchain.
- Execute system-wide profiling to map kernel execution timelines and resource utilization.
- Distinguish between compute-bound and memory-bound kernels using profiler metrics and the roofline model.
- Diagnose and mitigate PCIe data transfer overhead and host-to-device latency.
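
As a preview of the compute-bound versus memory-bound distinction named in the objectives, the roofline model can be sketched in a few lines of Python. This is a minimal illustration, not a profiling tool: the function names are hypothetical, and the peak throughput and bandwidth figures are made-up round numbers rather than the specifications of any real GPU. A kernel's arithmetic intensity (FLOPs per byte moved) is compared against the ridge point, where the bandwidth ceiling meets the compute ceiling.

```python
def ridge_point(peak_flops: float, peak_bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) where the two roofline ceilings meet."""
    return peak_flops / peak_bandwidth


def attainable_flops(intensity: float, peak_flops: float, peak_bandwidth: float) -> float:
    """Roofline upper bound on throughput (FLOP/s) at a given intensity."""
    return min(peak_flops, intensity * peak_bandwidth)


def classify(flops: float, bytes_moved: float,
             peak_flops: float, peak_bandwidth: float) -> str:
    """Label a kernel by comparing its arithmetic intensity to the ridge point."""
    intensity = flops / bytes_moved
    if intensity >= ridge_point(peak_flops, peak_bandwidth):
        return "compute-bound"
    return "memory-bound"


if __name__ == "__main__":
    # Assumed, illustrative device limits (not a real GPU's datasheet values).
    PEAK_FLOPS = 1e13      # 10 TFLOP/s
    PEAK_BANDWIDTH = 1e12  # 1 TB/s

    # float32 vector add: 1 FLOP per element, 12 bytes per element
    # (two 4-byte loads and one 4-byte store).
    print("vector add:", classify(1.0, 12.0, PEAK_FLOPS, PEAK_BANDWIDTH))

    # float32 N x N matmul, idealized traffic: 2*N^3 FLOPs, 3 matrices of 4*N^2 bytes.
    n = 4096
    print("matmul:", classify(2 * n**3, 12 * n**2, PEAK_FLOPS, PEAK_BANDWIDTH))
```

The byte counts here assume each operand crosses the memory bus exactly once. Real kernels usually move more data than that, which is why measured profiler metrics are needed to place a kernel on the roofline accurately.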